This notebook shows how BigBang can help you explore a mailing list archive.

First, use the %matplotlib inline IPython magic to tell the notebook to display matplotlib graphics inline, so that each plot appears directly below the cell that produces it.

Import the BigBang modules as needed. These should be in your Python environment if you've installed BigBang correctly.


In [1]:
import bigbang.mailman as mailman
import bigbang.graph as graph
import bigbang.process as process
from bigbang.parse import get_date
#from bigbang.functions import *
from bigbang.archive import Archive


/home/sb/projects/bigbang-multi/bigbang/config/config.py:8: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  dictionary = yaml.load(stream)

Also, let's import a number of other dependencies we'll use later.


In [2]:
import pandas as pd
import datetime
import matplotlib.pyplot as plt
import numpy as np
import math
import pytz
import pickle
import os

Now let's load the data for analysis.


In [3]:
urls = ["ipython-dev"]

archives = [Archive(url,mbox=True) for url in urls]

activities = [arx.get_activity(resolved=False) for arx in archives]


/home/sb/projects/bigbang-multi/bigbang/bigbang/mailman.py:157: UserWarning: No mailing list name found at ipython-dev
  warnings.warn("No mailing list name found at %s" % url)

In [4]:
archives[0].data


Out[4]:
From Subject Date In-Reply-To References Body
Message-ID
<3E9DE124.8080309@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] Mailing lists indexed at gmane 2003-04-16 23:03:00+00:00 None None Hi all,\n\nafter a suggestion by Jacek Generow...
<3E9E4094.7030802@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] Re: Refactoring of bdist_wininst... 2003-04-17 05:50:12+00:00 <003d01c28a9a$3dcb8560$e301340a@cyberhigh.fcoe... <003d01c28a9a$3dcb8560$e301340a@cyberhigh.fcoe... Hi Cory,\n\n> Done. install command will now ...
<000c01c304ee$3cb79e60$e901340a@cyberhigh.fcoe.k12.ca.us> cdodt at fcoe.k12.ca.us (Cory Dodt) [IPython-dev] RE: Refactoring of bdist_wininst... 2003-04-17 14:32:56+00:00 <3E9E4094.7030802@colorado.edu> None Distutils 1.0.3 is not included with Python 2....
<3E9EC1CA.3060800@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] RE: Refactoring of bdist_wininst... 2003-04-17 15:01:30+00:00 <000c01c304ee$3cb79e60$e901340a@cyberhigh.fcoe... <000c01c304ee$3cb79e60$e901340a@cyberhigh.fcoe... Cory Dodt wrote:\n> Distutils 1.0.3 is not inc...
<3E9EF5E3.8080100@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] [Fwd: [ANN] A new IPython is out... 2003-04-17 18:43:47+00:00 None None Hi all,\n\nI've just put out a new pre-release...
<3E9EFC95.7040309@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] ToDo for 0.4.0 2003-04-17 19:12:21+00:00 None None Hi all,\n\nI'd like to put out a list of thing...
<3E9F3B79.7070005@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] New bug tracker for IPython 2003-04-17 23:40:41+00:00 None None Hi all,\n\nI just wanted to let you know that,...
<3E9F3D9B.8040807@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] Re: iPython on Windows 2003-04-17 23:49:47+00:00 <GCEDKONBLEFPPADDJCOECEOIIPAA.whisper@oz.net> <GCEDKONBLEFPPADDJCOECEOIIPAA.whisper@oz.net> Hi David,\n\nmy apologies for the long delay i...
<200304291817.05898.Kasper.Souren@ircam.fr> Kasper.Souren at ircam.fr (Kasper Souren) [IPython-dev] possible feature request: auto-run 2003-04-29 18:17:05+00:00 None None Hi!\n\nI just had a little idea for a new IPyt...
<3EAEF194.5030709@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] possible feature request: auto-run 2003-04-29 21:41:40+00:00 <200304291817.05898.Kasper.Souren@ircam.fr> <200304291817.05898.Kasper.Souren@ircam.fr> Kasper Souren wrote:\n> Hi!\n> \n> I just had ...
<200304292248.10994.Kasper.Souren@ircam.fr> Kasper.Souren at ircam.fr (Kasper Souren) [IPython-dev] possible feature request: auto-run 2003-04-29 22:48:10+00:00 <3EAEF194.5030709@colorado.edu> <200304291817.05898.Kasper.Souren@ircam.fr> <3... > It's rather complicated to get it right, and...
<CB0365D517B7D611B5E100508B9498B6022A9B50@erlh904a.med.siemens.de> christopher.drexler at siemens.com (Drexler Ch... [IPython-dev] RE: [Fwd: [IPython-user] re: Fwd... 2003-05-12 07:28:55+00:00 None None Dear List,\n\nI'm working with IPython since a...
<200305121234.h4CCYmXo027167@wren.cs.unc.edu> gb at cs.unc.edu (Gary Bishop) [IPython-dev] RE: [Fwd: [IPython-user] re: Fwd... 2003-05-12 12:34:48+00:00 None None Thanks Chris,\n\nWith that hint and some googl...
<3EC143D7.8050907@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] Re: IPython Crash Report 2003-05-13 19:13:27+00:00 <200305131849.h4DInjXo018909@wren.cs.unc.edu> <200305131849.h4DInjXo018909@wren.cs.unc.edu> Hi Gary,\n\n> The idea is simple. I assume tha...
<200305171149.h4HBneXo024735@wren.cs.unc.edu> gb at cs.unc.edu (Gary Bishop) [IPython-dev] re: 0.4.0 ready for Monday 2003-05-17 11:49:39+00:00 None None It still says it is 0.2.15.pre5, I guess that ...
<3ECAA865.9090109@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] [ANN] IPython 0.4.0 has been rel... 2003-05-20 22:12:53+00:00 None None Hi all,\n\nI've just released IPython 0.4.0. ...
<200305221924.h4MJOEXo018537@wren.cs.unc.edu> gb at cs.unc.edu (Gary Bishop) [IPython-dev] Small change to MagicCompleter f... 2003-05-22 19:24:14+00:00 None None Now that I've got my Python readline starting ...
<200305221927.h4MJR0Xo018707@wren.cs.unc.edu> gb at cs.unc.edu (Gary Bishop) [IPython-dev] Small change to MagicCompleter f... 2003-05-22 19:27:00+00:00 None None Ignore that previous patch. That code should g...
<200305221955.h4MJteXo020281@wren.cs.unc.edu> gb at cs.unc.edu (Gary Bishop) [IPython-dev] A patch to fix filename completi... 2003-05-22 19:55:40+00:00 None None OK, sorry for the first one. Here is another t...
<3ECD3E3F.8010205@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] A patch to fix filename completi... 2003-05-22 21:16:47+00:00 <200305221955.h4MJteXo020281@wren.cs.unc.edu> <200305221955.h4MJteXo020281@wren.cs.unc.edu> Gary Bishop wrote:\n> OK, sorry for the first ...
<16078.27403.669231.313029@monster.linux.in> prabhu at aero.iitm.ernet.in (Prabhu Ramachand... [IPython-dev] Making docs and installing from ... 2003-05-23 18:40:11+00:00 None None Hi,\n\nI just got IPython off CVS and when ins...
<200305232045.h4NKj9Xo019149@wren.cs.unc.edu> gb at cs.unc.edu (Gary Bishop) [IPython-dev] IPython bug? 2003-05-23 20:45:08+00:00 None None With readline mark-directories set to "on" (th...
<200305232058.h4NKwHXo019746@wren.cs.unc.edu> gb at cs.unc.edu (Gary Bishop) [IPython-dev] IPython with Color on windows 2003-05-23 20:58:17+00:00 None None I *can* make color work on Windows with my Pyt...
<200305232102.h4NL2LXo019939@wren.cs.unc.edu> gb at cs.unc.edu (Gary Bishop) [IPython-dev] What readline features do people... 2003-05-23 21:02:21+00:00 None None I've got a Python implementation of GNU readli...
<3ECE998E.4030107@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] Making docs and installing from ... 2003-05-23 21:58:38+00:00 <16078.27403.669231.313029@monster.linux.in> <16078.27403.669231.313029@monster.linux.in> Prabhu Ramachandran wrote:\n> Hi,\n> \n> I jus...
<3ECED168.9000603@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] IPython with Color on windows 2003-05-24 01:56:56+00:00 <200305232058.h4NKwHXo019746@wren.cs.unc.edu> <200305232058.h4NKwHXo019746@wren.cs.unc.edu> Gary Bishop wrote:\n> I *can* make color work ...
<3ECED1CD.7020109@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] What readline features do people... 2003-05-24 01:58:37+00:00 <200305232102.h4NL2LXo019939@wren.cs.unc.edu> <200305232102.h4NL2LXo019939@wren.cs.unc.edu> Gary Bishop wrote:\n> I've got a Python implem...
<3ECED292.1050307@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] IPython bug? 2003-05-24 02:01:54+00:00 <200305232045.h4NKj9Xo019149@wren.cs.unc.edu> <200305232045.h4NKj9Xo019149@wren.cs.unc.edu> Gary Bishop wrote:\n> With readline mark-direc...
<200305240204.h4O24nXo029923@wren.cs.unc.edu> gb at cs.unc.edu (Gary Bishop) [IPython-dev] IPython bug? 2003-05-24 02:04:49+00:00 None None On Fri, 23 May 2003 20:01:54 -0600 "Fernando P...
<3ECED502.4010703@colorado.edu> fperez at colorado.edu (Fernando Perez) [IPython-dev] IPython bug? 2003-05-24 02:12:18+00:00 <200305240204.h4O24nXo029923@wren.cs.unc.edu> <200305240204.h4O24nXo029923@wren.cs.unc.edu> Gary Bishop wrote:\n> \n>> Try removing '\' fr...
... ... ... ... ... ... ...
<CAJXewOm2UUH-AG-XO9QxwUg_xaw7eDNT+FH37=384Hak-XPi9g@mail.gmail.com> nathan12343 at gmail.com (Nathan Goldbaum) [IPython-dev] [ANN] 2019 Scipy Conference: Cal... 2019-01-08 23:12:37+00:00 None None SciPy 2019, the 18th annual Scientific Computi...
<ED156154-49C9-4FCA-BCB5-E6F8F8BD4A22@yank.to> andreas at yank.to (Andreas Yankopolus) [IPython-dev] Changing functions in a running ... 2019-01-12 23:07:20+00:00 None None Is it possible to run a program to a breakpoin...
<CACfEFw_QhB24j+NDkWjqgymcFxhV1i+YmVrQ2QNCpeQCxaVoFg@mail.gmail.com> wes.turner at gmail.com (Wes Turner) [IPython-dev] Changing functions in a running ... 2019-01-13 01:37:52+00:00 <ED156154-49C9-4FCA-BCB5-E6F8F8BD4A22@yank.to> <ED156154-49C9-4FCA-BCB5-E6F8F8BD4A22@yank.to> https://en.wikipedia.org/wiki/Monkey_patch#Pit...
<B8590D2C-0843-4717-B581-827F52D36191@yank.to> andreas at yank.to (Andreas Yankopolus) [IPython-dev] IPython-dev Digest, Vol 174, Iss... 2019-01-13 01:56:54+00:00 <mailman.3460.1547343485.4818.ipython-dev@pyth... <mailman.3460.1547343485.4818.ipython-dev@pyth... Wes,\n\n> https://stackoverflow.com/questions/...
<CACfEFw9X4dkWYh+TmH9FHfPOBQdnGAGCogOtOBR9=39EuAsOmg@mail.gmail.com> wes.turner at gmail.com (Wes Turner) [IPython-dev] Changing functions in a running ... 2019-01-13 02:42:18+00:00 <CACfEFw_QhB24j+NDkWjqgymcFxhV1i+YmVrQ2QNCpeQC... <ED156154-49C9-4FCA-BCB5-E6F8F8BD4A22@yank.to>... In searching for a solution, I found your ques...
<CAOvn4qi8ESV9r1Qnw9UsCNL97q5tG8zxh9eQ7KUgYGO=K3gcCQ@mail.gmail.com> takowl at gmail.com (Thomas Kluyver) [IPython-dev] Changing functions in a running ... 2019-01-13 08:13:04+00:00 <ED156154-49C9-4FCA-BCB5-E6F8F8BD4A22@yank.to> <ED156154-49C9-4FCA-BCB5-E6F8F8BD4A22@yank.to> Hi Andreas,\n\nIf you define a function or var...
<50B9EB6C-CBCB-46E3-AE8E-31FDD54CD568@yank.to> andreas at yank.to (Andreas Yankopolus) [IPython-dev] Changing functions in a running ... 2019-01-14 15:16:53+00:00 <mailman.1.1547398802.9508.ipython-dev@python.... <mailman.1.1547398802.9508.ipython-dev@python.... Thomas,\n\n> If you define a function or varia...
<CACfEFw8CbR7eSHjrCO2S22ZWDSCS01UBXDwhWLVnhKdPU7TEDw@mail.gmail.com> wes.turner at gmail.com (Wes Turner) [IPython-dev] Changing functions in a running ... 2019-01-14 16:02:38+00:00 <50B9EB6C-CBCB-46E3-AE8E-31FDD54CD568@yank.to> <mailman.1.1547398802.9508.ipython-dev@python.... There are likely more convenient patterns for ...
<5729286F-E250-412C-8776-BB8B4B6A3ABB@zoho.com> ToTheDude at zoho.com (TheDude) [IPython-dev] pixiedebugger: can't install 2019-01-27 04:22:12+00:00 None None Hello,\n\tI would really like to have the pixi...
<FD042477-AC19-4337-9C6D-4E406B523E33@zoho.com> ToTheDude at zoho.com (The Dude) [IPython-dev] pixiedebugger: can't install 2019-01-28 04:55:44+00:00 <CANADyYmj8Eng18g_qP1oi4M70nhdWzAADSn8E49CEpML... <5729286F-E250-412C-8776-BB8B4B6A3ABB@zoho.com... Hi Lisa,\n\tthanks for your help. \n\nUnfortun...
<CANADyYkLGdhoj7ubb=RfqF4NK0JUcNb88H2Mgd=hSJHgmQSHsw@mail.gmail.com> lisagbang at gmail.com (Lisa Bang) [IPython-dev] pixiedebugger: can't install 2019-01-28 15:08:59+00:00 <FD042477-AC19-4337-9C6D-4E406B523E33@zoho.com> <5729286F-E250-412C-8776-BB8B4B6A3ABB@zoho.com... Hi Dude,\n\nYes, it looks like when subprocess...
<20190215231032.GA30925@bluff.e-den.it> sandro.dentella at gmail.com (Alessandro Dente... [IPython-dev] kernel spect and how to thest them 2019-02-15 23:10:32+00:00 None None Hi,\n\nI'd like to bettere understand "argv" l...
<CAOvn4qiSpWmy4qJOHA7pkDqM7XpHaBf068X=F6ssA64MA0KvcA@mail.gmail.com> takowl at gmail.com (Thomas Kluyver) [IPython-dev] kernel spect and how to thest them 2019-02-16 10:46:39+00:00 <20190215231032.GA30925@bluff.e-den.it> <20190215231032.GA30925@bluff.e-den.it> Hi Sandro,\n\n> I'd like to bettere understand...
<CANJQusV1cttLB708kcvKQMo4z_KM+piFYveYUTN1od2fHnVa9g@mail.gmail.com> bussonniermatthias at gmail.com (Matthias Buss... [IPython-dev] IPython 7.4 out. 2019-02-21 02:00:17+00:00 None None See the full announce on discourse:\n\nhttps:/...
<20190221091828.GA11372@bluff.e-den.it> sandro at e-den.it (Alessandro Dentella) [IPython-dev] kernel spect and how to thest them 2019-02-21 09:18:28+00:00 <CAOvn4qiSpWmy4qJOHA7pkDqM7XpHaBf068X=F6ssA64M... <20190215231032.GA30925@bluff.e-den.it>\n <CAO... Thank you Thomas for the explanations that hel...
<20190221092828.GA18912@bluff.e-den.it> sandro.dentella at gmail.com (Alessandro Dente... [IPython-dev] many kernels, jupyter lab and re... 2019-02-21 09:28:28+00:00 None None Hi,\n\nAs I started to use jupyter notebooks I...
<CABRXM4ka=gNoYUQCZhCHaOWbJvZSWHt42AYKirrE_xiD4=iFcw@mail.gmail.com> cappy2112 at gmail.com (Tony Cappellini) [IPython-dev] Jupyterlab Terminal launches Pow... 2019-02-26 00:52:07+00:00 None None I've just clicked on the "terminal" in Jupyter...
<CANJQusW1NqrYA2WVUNXZLEkLK86NpX3yDX-u=JZ+x-x2cOcO6w@mail.gmail.com> bussonniermatthias at gmail.com (Matthias Buss... [IPython-dev] Release of IPython 7.4.0 2019-03-21 22:04:46+00:00 None None I just helped Akshay to make the release of IP...
<CAAwU4Wytt+XgONb+cZoTHQxd356TDJAfXygvRHVPDh5+y_bk7w@mail.gmail.com> john at fuzzdog.com (John Dey) [IPython-dev] ipython 7.7.0 install issues 2019-08-09 19:58:55+00:00 None None I'm building Python 3.7.4 with ipython7.7.0.\n...
<CANJQusVNUuJ7CE-Vjt3CyHNMQpp0+DJRPej5-z8hCcUvPZ=dQA@mail.gmail.com> bussonniermatthias at gmail.com (Matthias Buss... [IPython-dev] ipython 7.7.0 install issues 2019-08-09 21:27:49+00:00 <CAAwU4Wytt+XgONb+cZoTHQxd356TDJAfXygvRHVPDh5+... <CAAwU4Wytt+XgONb+cZoTHQxd356TDJAfXygvRHVPDh5+... How are you installing IPython ?\n\nWhen insta...
<CAAwU4WwmExoQc1d7W_PY4iyY4yNB6Wh7dWapk55a1srzLXR+7w@mail.gmail.com> john at fuzzdog.com (John Dey) [IPython-dev] ipython 7.7.0 install issues 2019-08-09 22:27:09+00:00 <CANJQusVNUuJ7CE-Vjt3CyHNMQpp0+DJRPej5-z8hCcUv... <CAAwU4Wytt+XgONb+cZoTHQxd356TDJAfXygvRHVPDh5+... I am building everything from source, over 700...
<CACejjWzE+zDoJAxCmhUXGxj8s3FqfVgPLh44iCOy-B1Ojh2P8w@mail.gmail.com> nick.bollweg at gmail.com (Nicholas Bollweg) [IPython-dev] ipython 7.7.0 install issues 2019-08-09 22:51:54+00:00 <CAAwU4WwmExoQc1d7W_PY4iyY4yNB6Wh7dWapk55a1srz... <CAAwU4Wytt+XgONb+cZoTHQxd356TDJAfXygvRHVPDh5+... Ipython, notebook, traitlets and genutils used...
<CANJQusXjVdsQQeEC=3fd0emAkE1PpCbbSFSfavG+MEdP2dxsXQ@mail.gmail.com> bussonniermatthias at gmail.com (Matthias Buss... [IPython-dev] ipython 7.7.0 install issues 2019-08-09 22:52:09+00:00 <CAAwU4WwmExoQc1d7W_PY4iyY4yNB6Wh7dWapk55a1srz... <CAAwU4Wytt+XgONb+cZoTHQxd356TDJAfXygvRHVPDh5+... Then how are you building ?\n\nIf you don't gi...
<CAAwU4WyEyVqoaxWhfKV6v+q-M5vJpSfajEPztgDfv5juS3xUUQ@mail.gmail.com> john at fuzzdog.com (John Dey) [IPython-dev] ipython 7.7.0 install issues 2019-08-12 19:26:47+00:00 <CACejjWzE+zDoJAxCmhUXGxj8s3FqfVgPLh44iCOy-B1O... <CAAwU4Wytt+XgONb+cZoTHQxd356TDJAfXygvRHVPDh5+... Thanks for your comment about GenUtils it was ...
<5838133d9a854fb8efd22882907d1f75@127.0.0.1> lists at moltenmercury.org (lists at moltenmer... [IPython-dev] Fancy icon for ipython.exe 2019-11-13 12:18:19+00:00 <VI1PR03MB2909C20073F0F6773F14478FB6470@VI1PR0... <VI1PR03MB2909C20073F0F6773F14478FB6470@VI1PR0... Good Morning,\n\n\nPlease see the attached doc...
<CAOvn4qhJAH=JpFuJKK_DRmEREmB5b-o8BkSuxsxequuLFjcg=w@mail.gmail.com> takowl at gmail.com (Thomas Kluyver) [IPython-dev] Fancy icon for ipython.exe 2019-11-13 13:14:09+00:00 <5838133d9a854fb8efd22882907d1f75@127.0.0.1> <VI1PR03MB2909C20073F0F6773F14478FB6470@VI1PR0... Ignore that zip file - it contains a Word doc...
<9143D0A4-1723-4CDE-8B25-11B25B079C41@gmail.com> m10fayed at gmail.com (Muhammad Fayed) [IPython-dev] How to import python 7.x into py... 2020-01-27 05:03:55+00:00 None None Hi,\nHope my mail find you well,\nFirst, I wan...
<CAMofdRDX_-i=gs8=dXCJs3wcOB8nO7NTFW4UV--=f=Jhc7928w@mail.gmail.com> steve at holdenweb.com (Steve Holden) [IPython-dev] How to import python 7.x into py... 2020-01-27 07:53:34+00:00 <9143D0A4-1723-4CDE-8B25-11B25B079C41@gmail.com> <9143D0A4-1723-4CDE-8B25-11B25B079C41@gmail.com> Hi there,\n\nThanks for your note. Unfortunate...
<4CB44FBB-2BD0-4DAC-B6F4-BD5DF184CAC3@gmail.com> m10fayed at gmail.com (Muhammad Fayed) [IPython-dev] How to import python 7.x into py... 2020-01-27 16:46:19+00:00 <CAMofdRDX_-i=gs8=dXCJs3wcOB8nO7NTFW4UV--=f=Jh... <9143D0A4-1723-4CDE-8B25-11B25B079C41@gmail.co... Sorry for my late reply,\nI?ve tried ?import I...
<CALGmxE+Qn8EChtf-fOi0ZFaK_XrvTr_yoa3Da+6_EeFo_db01w@mail.gmail.com> chris.barker at noaa.gov (Chris Barker - NOAA ... [IPython-dev] How to import python 7.x into py... 2020-01-27 16:59:54+00:00 <4CB44FBB-2BD0-4DAC-B6F4-BD5DF184CAC3@gmail.com> <4CB44FBB-2BD0-4DAC-B6F4-BD5DF184CAC3@gmail.com> I?m not sure I understand the question, but I ...

16328 rows × 6 columns
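The archive data shown above is a pandas DataFrame indexed by Message-ID, so ordinary pandas operations should apply to it. Below is a minimal sketch on a hypothetical three-row frame that reuses a subset of the column names. The rows, addresses, and the names toy, roots, and per_sender are made up for illustration and are not part of BigBang.

```python
import pandas as pd

# Hypothetical miniature of an archive frame: indexed by Message-ID,
# with a subset of the columns shown in the output above.
toy = pd.DataFrame(
    {
        "From": [
            "alice at example.org (Alice)",
            "bob at example.org (Bob)",
            "alice at example.org (Alice)",
        ],
        "Subject": ["[list] Hello", "[list] Re: Hello", "[list] Re: Hello"],
        "Date": pd.to_datetime(
            ["2003-04-16 23:03:00", "2003-04-17 05:50:12", "2003-04-17 14:32:56"],
            utc=True,
        ),
        "In-Reply-To": [None, "<m1@example.org>", "<m2@example.org>"],
    },
    index=pd.Index(
        ["<m1@example.org>", "<m2@example.org>", "<m3@example.org>"],
        name="Message-ID",
    ),
)

# Messages without an In-Reply-To header start a new thread.
roots = toy[toy["In-Reply-To"].isna()]

# Number of messages per sender, most active first.
per_sender = toy["From"].value_counts()
```

The same two expressions could be tried on archives[0].data, assuming its missing In-Reply-To values are stored as None or NaN, as the output above suggests.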

The window variable sets the number of days over which the rolling averages are computed.


In [5]:
window = 100
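To make the effect of the window concrete, here is a small sketch on hypothetical daily counts with a window of 4 instead of 100. It assumes one row per day. With an integer window, pandas counts rows rather than calendar days, so gaps in the index would change the meaning. The names counts, small_window, and smoothed are illustrative only.

```python
import pandas as pd

# Hypothetical daily message counts for ten consecutive days.
days = pd.date_range("2003-04-16", periods=10, freq="D")
counts = pd.Series([3, 0, 5, 2, 4, 1, 0, 6, 2, 7], index=days)

small_window = 4  # stand-in for the notebook's window = 100

# Each value is the mean of the current row and the small_window - 1 rows before it.
rolling_mean = counts.rolling(small_window).mean()

# The first small_window - 1 entries have no full window and are NaN,
# so they are dropped before plotting.
smoothed = rolling_mean.dropna()
```

A larger window gives a smoother curve but discards more of the start of the series and reacts more slowly to bursts of activity.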

For each of the mailing lists we are looking at, plot the rolling average of the number of emails sent per day.


In [6]:
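plt.figure(figsize=(12.5, 7.5))

colors = 'rgbkm'  # one matplotlib color code per mailing list

for i, activity in enumerate(activities):
    # Total messages per day, summed across all senders
    ta = activity.sum(axis=1)
    # Rolling mean over `window` rows; the first window - 1 values are NaN
    rmta = ta.rolling(window).mean()
    rmtadna = rmta.dropna()
    plt.plot_date(np.array(rmtadna.index),
                  np.array(rmtadna.values),
                  colors[i],
                  label=mailman.get_list_name(urls[i]) + ' activity',
                  xdate=True)

plt.legend()  # draw the legend once, after all lists are plotted

plt.savefig("activites-marked.png")
plt.show()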


/home/sb/projects/bigbang-multi/bigbang/bigbang/mailman.py:157: UserWarning: No mailing list name found at ipython-dev
  warnings.warn("No mailing list name found at %s" % url)
/home/sb/anaconda3/lib/python3.6/site-packages/pandas/plotting/_matplotlib/converter.py:102: FutureWarning: Using an implicitly registered datetime converter for a matplotlib plotting method. The converter was registered by pandas on import. Future versions of pandas will require you to explicitly register matplotlib converters.

To register the converters:
	>>> from pandas.plotting import register_matplotlib_converters
	>>> register_matplotlib_converters()
  warnings.warn(msg, FutureWarning)

Now let's see who has sent the most messages to one particular list.


In [7]:
a = activities[0]  # activity matrix for the first mailing list
ta = a.sum(axis=0)  # total messages per sender (sum over all dates)
ta.sort_values(ascending=True)[-10:].plot(kind='barh')  # ten most active senders


Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ff6834a76d8>

This might be useful for seeing the distribution (does the top message sender dominate?) or for identifying key participants to talk to.
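One rough way to quantify whether the top sender dominates is the share of all messages that person sent. The sketch below uses a hypothetical activity matrix with days as rows and senders as columns. The names activity, totals, and top_share are illustrative only.

```python
import pandas as pd

# Hypothetical activity matrix: rows are days, columns are senders,
# values are the number of messages sent.
activity = pd.DataFrame(
    {"alice": [4, 2, 6], "bob": [1, 0, 2], "carol": [0, 3, 2]},
    index=pd.date_range("2003-04-16", periods=3, freq="D"),
)

totals = activity.sum(axis=0)  # messages per sender
top = totals.sort_values(ascending=False)

# Fraction of all messages that came from the most active sender.
top_share = top.iloc[0] / totals.sum()
```

On this toy data the top sender accounts for 12 of 20 messages, a share of 0.6.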


Many mailing lists will have some duplicate senders: individuals who use multiple email addresses or are recorded as different senders when using the same email address. We want to identify those potential duplicates in order to get a more accurate representation of the distribution of senders.

To begin with, let's do a naive calculation of the similarity of the From strings, based on the Levenshtein distance.

This can take a long time for a large matrix, so we will truncate it for purposes of demonstration.


In [8]:
import Levenshtein
# Pairwise distance between the first 100 From headers
distancedf = process.matricize(a.columns[:100], process.from_header_distance)
df = distancedf.astype(int)  # cast the distances to integers
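For readers unfamiliar with the Levenshtein distance, the following is a plain-Python sketch of the classic dynamic-programming algorithm, applied pairwise to a few hypothetical From strings. It illustrates the measure only. It is not the implementation behind process.from_header_distance, which may preprocess the headers differently, and the function and variable names here are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Hypothetical From strings; the first two are likely the same person.
senders = [
    "jane at example.org (Jane Doe)",
    "jane at example.com (Jane Doe)",
    "sam at example.net (Sam Roe)",
]

# Full pairwise matrix: quadratic in the number of senders,
# which is why the notebook truncates to 100 columns.
matrix = [[levenshtein(x, y) for y in senders] for x in senders]
```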

In [9]:
fig = plt.figure(figsize=(18, 18))
plt.imshow(df)
#plt.yticks(np.arange(0.5, len(df.index), 1), df.index) # these lines would show labels, but that gets messy
#plt.xticks(np.arange(0.5, len(df.columns), 1), df.columns)


Out[9]:
<matplotlib.image.AxesImage at 0x7ff6844b9cf8>

The dark blue diagonal compares each entry to itself (the distance is zero in that case), but a few other dark blue patches suggest that there are duplicates even under this most naive measure.
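Rather than reading the dark patches off the image, candidate duplicates can be listed by thresholding the matrix. The sketch below works on a hypothetical four-by-four distance matrix. The threshold of 5 is an assumption, not a value from the notebook, and a suitable cutoff would have to be chosen by inspecting real data.

```python
import numpy as np
import pandas as pd

# Hypothetical symmetric distance matrix for four From strings.
names = ["a1", "a2", "b", "c"]
dist = pd.DataFrame(
    [[0, 2, 15, 20],
     [2, 0, 14, 19],
     [15, 14, 0, 18],
     [20, 19, 18, 0]],
    index=names,
    columns=names,
)

threshold = 5  # assumed cutoff for "suspiciously similar"

# Keep only the strict upper triangle so that the zero diagonal
# and the mirrored lower half are skipped.
upper = np.triu(np.ones(dist.shape, dtype=bool), k=1)
rows, cols = np.where(upper & (dist.values < threshold))

# (name, name, distance) for every pair below the threshold.
candidates = [(names[r], names[c], int(dist.values[r, c]))
              for r, c in zip(rows, cols)]
```

Each candidate pair would still need a manual check, since short distances can also occur between different people with similar names.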

